Continuous speech recognition using joint features derived from the modified group delay function and MFCC
Abstract
Feature extraction and selection for continuous speech recognition is a complex task. State-of-the-art speech recognition systems use features that are derived by ignoring the Fourier transform phase. In our earlier studies we showed the efficacy of the modified group delay feature (MODGDF), derived from the Fourier transform phase, for phoneme, syllable and speaker recognition. In this paper we use the MODGDF and the popular MFCC, derived from the Fourier transform magnitude, to compute joint features for continuous speech recognition of two Indian languages, Tamil and Telugu. A novel method is used that segments the continuous speech signal into syllable-like units and then recognizes them in isolated style using HMMs. We further use an innovative technique that transforms the problem of detecting the correct string of syllabic units with maximum likelihood into one of finding an optimal state sequence locally. The recognition system does not use any language models. The MODGDF gave promising recognition performance for the two languages and compared well with the MFCC. Joint features derived using MODGDF and MFCC gave a 10.6% improvement for both Tamil and Telugu. The improvement reinforces the hypothesis that the MODGDF captures information complementary to that of the MFCC and can be used along with the MFCC to capture the complete information in the speech signal at a functional level, helping to avoid heavy auditory and language models.
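As a rough illustration of what "derived from the Fourier transform phase" can mean in practice, the sketch below computes the plain (unmodified) group delay function of one frame using the standard identity tau(w) = (X_R Y_R + X_I Y_I) / |X(w)|^2, where X is the DFT of x[n] and Y is the DFT of n x[n]. This avoids explicit phase unwrapping. It does not reproduce the modification step or the cepstral conversion that yield the MODGDF, and the abstract does not say how the joint features are formed; the frame-wise concatenation in `joint_features` is only an assumption. The function names, `n_fft` and `eps` are likewise hypothetical.

```python
import numpy as np


def group_delay(x, n_fft=512, eps=1e-12):
    """Plain group delay of one frame, without phase unwrapping.

    tau(w) = (X_R*Y_R + X_I*Y_I) / |X(w)|^2, where X = DFT{x[n]} and
    Y = DFT{n*x[n]}. `eps` only guards against division by zero.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)


def joint_features(phase_feats, mfcc_feats):
    """Assumed fusion scheme: concatenate the two streams frame by frame.

    Both inputs are (num_frames, dim) arrays with the same num_frames.
    """
    phase_feats = np.asarray(phase_feats)
    mfcc_feats = np.asarray(mfcc_feats)
    if phase_feats.shape[0] != mfcc_feats.shape[0]:
        raise ValueError("both streams need the same number of frames")
    return np.concatenate([phase_feats, mfcc_feats], axis=1)


# Sanity check: an impulse delayed by 5 samples has a flat group delay of 5.
frame = np.zeros(64)
frame[5] = 1.0
tau = group_delay(frame)
```

For a pure delay the group delay is constant across frequency, which makes the delayed impulse a convenient check. Real speech frames give a much spikier function, which is the difficulty the related abstracts describe.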
Similar resources
Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing
This paper investigates the significance of combining cepstral features derived from the modified group delay function and from the short-time spectral magnitude like the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum primarily due to pitch periodicity effects. The group delay function is modified to suppress thes...
The modified group delay feature: a new spectral representation of speech
Automatic recognition of speech by machines begins with extraction of meaningful features from the speech signal. Conventional features like the MFCC are derived from the Fourier transform magnitude spectrum, while totally ignoring the phase spectrum. The importance of the modified group delay feature (MODGDF) derived from the Fourier transform phase spectrum for speaker and phoneme recognition...
The modified group delay function and its application to phoneme recognition
We explore a new spectral representation of speech signals through group delay functions. The group delay functions by themselves are noisy and difficult to interpret owing to zeroes that are close to the unit circle in the z-domain and these clutter the spectra. A new modified group delay function [1] that reduces the effects of zeroes close to the unit circle is used. Assuming that this new f...
Cluster and Intrinsic Dimensionality Analysis of the Modified Group Delay Feature for Speaker Classification
Speakers are generally identified by using features derived from the Fourier transform magnitude. The modified group delay feature (MODGDF) derived from the Fourier transform phase has been used effectively for speaker recognition in our previous efforts. Although the efficacy of the MODGDF as an alternative to the MFCC is yet to be established, it has been shown in our earlier work that composit...
Using group delay functions from all-pole models for speaker recognition
Popular features for speech processing, such as mel-frequency cepstral coefficients (MFCCs), are derived from the short-term magnitude spectrum, whereas the phase spectrum remains unused. While the common argument to use only the magnitude spectrum is that the human ear is phase-deaf, phase-based features have remained less explored due to additional signal processing difficulties they introduc...